Abstract


There are two common misconceptions in the analysis of trending time series. First, if a series is
difference-stationary, removing a linear time trend introduces spurious cyclicality. Second, regardless
of whether a series is difference-stationary or trend-stationary, taking first differences produces,
in either case, a covariance stationary sequence, and so is recommended econometric practice. We
show that the first statement is incorrect and that the second can be misleading.

1.  Introduction. 

A number of recent papers have recommended taking first-differences of observed time series prior to econometric
analysis (see for example Campbell and Mankiw (1988) and others). The reasoning is as follows. If
a  series  is  truly  difference-stationary,  then  removing  a  linear  time-trend  produces  spurious  cyclicality  in  the 
residuals.  Under  the  same  condition,  taking  first  differences  produces  a  series  that  is  covariance  stationary, 
and  so  is  convenient  for  econometric  analysis.  If,  on  the  other  hand,  the  series  is  truly  trend-stationary, 
taking  first-differences  nevertheless  produces  a  covariance  stationary  series,  albeit  one  with  a  zero  in  the 
spectral  density  at  frequency  zero.  This  is  still  satisfactory  however  (the  reasoning  goes),  as  the  unit  root  in 
the  moving  average  part  produced  by  over-differencing  will  manifest  in  the  final  estimates.  Thus,  if  one  is  to 
remain  agnostic  as  to  the  cyclicality  of  the  observed  time  series,  the  recommended  practice  is  always  to  take 
first-differences  prior  to  econometric  analysis.  Put  another  way,  the  recommended  practice  is  not  to  detrend 
for detrending leads to spurious "cyclical" behavior in the residuals. This view has become quite widely held
by  many  macroeconomists.  See  for  example  Campbell  (1987),  Deaton  (1986),  Nelson  (1987),  Nelson  and 
Kang  (1981,  1984),  Mankiw  and  Shapiro  (1985),  Romer  (1987),  and  Shapiro  (1986)  among  others. 

Nelson and Kang (1981) and Nelson and Plosser (1982) have forcefully argued that least squares detrending
of a unit root process produces spurious cyclicality. It is known that when the data are trend-stationary,
OLS estimators of the intercept and time trend coefficients converge to their probability limits quite rapidly.
Thus  there  is  good  reason  to  believe  that  in  this  case,  the  detrended  data  appropriately  reveal  the  true 
underlying  dynamics  about  trend.  In  addition,  Durlauf  and  Phillips  (1986)  have  shown  that  when  the  data 
are  difference-stationary,  the  OLS  estimator  for  the  time  trend  coefficient,  while  converging  in  probability, 
does  so  much  more  slowly  relative  to  that  in  the  trend-stationary  case.  Further,  the  OLS  estimator  for  the 
intercept in this case actually diverges. Thus it is not surprising that the "spurious cyclicality" view is so
prevalent.

This paper demonstrates that this reasoning is fundamentally incorrect. First, we demonstrate that
removing a linear time trend actually preserves the true stochastic characteristics of the data. We show
that data detrended by least squares regression asymptotically provide the correct picture of the underlying
dynamics, independent of whether the data are truly trend- or difference-stationary. More precisely, we
establish  two  results  here:  First,  if  the  data  are  trend-stationary,  the  covariogram  estimator  using  detrended 
data is asymptotically indistinguishable from that using the true unobserved fluctuations about trend. Consequently,
the same is true of the correlogram estimator in this case. Second, suppose instead the data are
difference-stationary, and a researcher detrends the data by least squares. We show then that the correlogram
estimator at each lag converges in probability to 1, which correctly indicates the presence of a unit
root.  In  other  words,  the  researcher  will  appropriately  conclude  that  the  residuals  are  a  unit  root  process, 
and  are  not  cyclical  about  trend. 

These  statements  however  are  asymptotic;  it  may  still  be  the  case  that  in  finite  samples,  measures  of 
dynamic  correlation  subsequent  to  least  squares  detrending  are  severely  misleading.  To  analyze  this,  we  take 
as starting point the findings of Nelson and Kang (1981) on the finite sample bias due to using detrended
residuals  when  the  data  are  truly  difference  stationary.  We  show  that  using  detrended  residuals,  the  exact  bias 
in  the  covariogram  estimator  is  of  approximately  the  same  magnitude,  regardless  of  whether  the  underlying 
process  is  trend-stationary  or  difference-stationary.  With  finite  samples,  any  reasonable  transformation  of 
the  estimated  covariogram,  such  as  the  estimator  for  the  spectral  density,  will  have  approximately  the  same 
bias  properties,  again  regardless  of  whether  the  data  are  truly  trend-  or  difference-stationary.  Thus  the  bias 
indicated  by  Nelson  and  Kang  (1981)  is  irreducible  even  in  the  "best  practice"  case  of  using  least  squares  to 
detrend trend-stationary data. We conclude that emphasizing bias due to detrending a difference stationary
process  is  misguided:  there  is  a  significant  finite  sample  bias  associated  with  least  squares  detrending,  period. 

Clearly,  if  the  data  are  difference-stationary,  it  is  a  misspecification  to  estimate  a  time  trend,  and  then  to 
interpret  the  residuals  as  being  close  to  a  stationary  process.  We  agree  that  estimating  a  correctly  specified 
model  is  better  than  estimating  an  incorrectly  specified  model.  However,  unless  its  proponents  wish  to  extend 
the  case  that  least  squares  detrending  produces  misleading  results  to  when  the  data  are  trend-stationary  as 
well, we do not find convincing the argument that detrending produces spurious cyclicality.

Next,  we  show  that  taking  first-differences  is  treacherous  for  subsequent  econometric  analysis,  should 
the  true  data  generating  process  actually  be  trend-stationary.  We  illustrate  this  for  three  different  statistical 
procedures:  estimating  the  mean  of  such  a  transformed  process,  estimating  the  moving  average  coefficient, 
and  finally,  estimating  causality  patterns  when  one  of  the  variables  is  such  an  over-differenced  process. 

In  sum,  our  analysis  almost  exactly  overturns  the  conclusions  indicated  in  the  introductory  paragraph. 
We take issue with the suggestion that macroeconomists should always use first-differenced series so as not to
pre-judge the cyclicality in the data: we strongly disagree with this view.

The  remainder  of  the  paper  is  organized  as  follows.  Section  2  considers  the  effects  of  removing  a  fixed 
but  arbitrary  linear  time  trend  from  a  difference-stationary  process.  The  conclusion  here  is  obvious:  the 
deviations  from  any  fixed  linear  time  trend  remain  difference-stationary.  Section  3  considers  the  effects  of 
first-differencing  a  trend-stationary  process.  The  results  here  are  less  benign:  this  introduction  of  a  stochastic 
singularity  manifests  in  the  spectral  density  vanishing  at  frequency  zero.  This  is  shown  to  cause  problems 
for  a  number  of  inferential  questions  of  interest. 

In  Sections  2  and  3,  we  treat  the  artificial  but  usefully  intuitive  case  where  by  detrending  we  mean 
removal  of  a  fixed  although  completely  arbitrary  linear  time  trend.  Section  4  considers  the  relevant  practical 
situation  when  detrending  is  performed  by  least  squares  regression.  We  show  here  that  our  earlier  conclusions 
are  unmodified:  regardless  of  whether  the  true  data  generating  process  is  trend-  or  difference-stationary,  the 
correlogram estimators using least squares detrended data are consistent asymptotically, and are "similarly"
behaved in finite samples. Section 5 concludes the paper.

2. Difference Stationary Data Generating Process.

Suppose  that  the  underlying  data  generating  mechanism  is  difference-stationary: 

(i) $Y_t = \beta_0 + Y_{t-1} + u_t, \quad t \geq 1$,
(ii) $Y_0$ a given random variable, and
(iii) $u_t$ covariance stationary with mean zero and spectral density bounded away from zero.


Iterating on (i), we have that:

$$Y_t = Y_0 + \beta_0 \cdot t + \sum_{j=1}^{t} u_j.$$


A linear time trend specification is a pair of real numbers $(\alpha_1, \beta_1)$. Removing this linear time trend from $Y_t$
results in:

$$X_t = Y_t - \alpha_1 - \beta_1 \cdot t$$

$$\Rightarrow \quad X_t = (Y_0 - \alpha_1) + (\beta_0 - \beta_1) \cdot t + \sum_{j=1}^{t} u_j.$$

The "detrended residuals" $X_t$ have the integrated form of a difference-stationary sequence. In particular,
$X_t$ can be written as:

(a.) $X_t = (\beta_0 - \beta_1) + X_{t-1} + u_t, \quad t \geq 1$,
(b.) $X_0$ a given random variable, $Y_0 - \alpha_1$,
(c.) the same as (iii) above.

Thus, the detrended residuals are another difference-stationary sequence, where the first-difference sequence
$X_t - X_{t-1}$ has exactly the same probability properties as the first-difference of the original sequence, $Y_t - Y_{t-1}$
(except possibly in mean).
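
The invariance of the first differences can be checked numerically. The following is a minimal sketch, assuming Gaussian white-noise disturbances and arbitrary illustrative values for the drift and for the removed trend; the function names are hypothetical and not taken from the paper.

```python
import random

def simulate_difference_stationary(n, beta0, y0=0.0, seed=0):
    """Y_t = beta0 + Y_{t-1} + u_t, with Gaussian white-noise u_t (an illustrative assumption)."""
    rng = random.Random(seed)
    y = [y0]
    for _ in range(n):
        y.append(beta0 + y[-1] + rng.gauss(0.0, 1.0))
    return y  # y[t] for t = 0, ..., n

def remove_fixed_trend(y, alpha1, beta1):
    """X_t = Y_t - alpha1 - beta1 * t for a fixed, arbitrary pair (alpha1, beta1)."""
    return [y_t - alpha1 - beta1 * t for t, y_t in enumerate(y)]

def first_differences(z):
    return [z[t] - z[t - 1] for t in range(1, len(z))]

BETA1 = 1.7  # arbitrary slope of the removed trend
y = simulate_difference_stationary(n=200, beta0=0.3)
x = remove_fixed_trend(y, alpha1=5.0, beta1=BETA1)

dy = first_differences(y)
dx = first_differences(x)

# The two difference sequences coincide up to the constant beta1 (a shift in mean only).
max_gap = max(abs((a - b) - BETA1) for a, b in zip(dy, dx))
```

Up to floating point error, `max_gap` is zero: the detrended series inherits the increments of the original series, shifted only in mean.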

Therefore, unless one supposes that the original series already has a tendency to show "spurious cyclicality,"
there is absolutely no reason to draw that conclusion for the detrended series.

We  return  in  Section  4  below  to  this  observation  that  detrending  does  not  alter  the  dynamic  properties 
of the data.

3. Trend Stationary Data Generating Process.

Suppose  now  that  the  true  data  generating  mechanism  is  trend-stationary: 

(i) $Y_t = \alpha_0 + \beta_0 \cdot t + u_t$,
(ii) $u_t$ covariance stationary with mean zero and spectral density bounded away from zero.

As before, a linear time trend is a pair of real numbers $(\alpha_1, \beta_1)$. Removing a time trend therefore
produces:

$$X^1_t = Y_t - \alpha_1 - \beta_1 \cdot t$$

$$\Rightarrow \quad X^1_t = (\alpha_0 - \alpha_1) + (\beta_0 - \beta_1) \cdot t + u_t.$$

The resulting process $X^1_t$ is seen to be trend-stationary, with exactly the same dynamic stochastic properties
as in the original data $Y$ (in $u$). In the special case where $\alpha_1$ and $\beta_1$ are equal to $\alpha_0$ and $\beta_0$, the result
is covariance stationarity, which is simply a special case of trend-stationarity. Comparing this with the
conclusion from Section 2, we note: Removing a fixed but arbitrary linear time trend preserves the true
properties of the data, regardless of whether the underlying model is difference stationary or trend stationary.
Hence linear de-trending actually has the very desirable property that it does not distort the true features
of the data, contrary to numerous assertions that the opposite is true.

Next consider taking first differences of trend-stationary data:

$$X^2_t = Y_t - Y_{t-1} = \beta_0 + (u_t - u_{t-1}).$$

Since $u_t$ is covariance stationary, $X^2_t$ is also covariance stationary, although it has a zero in its spectral
density at frequency zero. This last characteristic has been noted by macroeconomists, but its significance
has always been relegated to the statement that "it will show up as a unit root in the moving average part."
We now point out that instead its presence renders $X^2$ treacherous to analyze econometrically.
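
The zero at frequency zero can be seen from the standard filtering formula for spectral densities. The short derivation below is a sketch; the symbols $L$ (lag operator), $f_u$ and $f_{X^2}$ (spectral densities of $u$ and of the first-differenced series) are notation introduced here for illustration and do not appear in the original text.

```latex
% Assumed notation: L is the lag operator, f_u the spectral density of u_t.
X^2_t - \beta_0 = (1 - L)\,u_t
\quad\Longrightarrow\quad
f_{X^2}(\omega) = \left|1 - e^{-i\omega}\right|^2 f_u(\omega)
               = 2\,(1 - \cos\omega)\, f_u(\omega),
\qquad\text{so that}\qquad
f_{X^2}(0) = 0 .
```

Since $f_u$ is bounded away from zero, the zero of $f_{X^2}$ at $\omega = 0$ comes entirely from the differencing filter.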

Inference about $X^2$ requires that as a stochastic process, $X^2$ produces statistics that admit nondegenerate
asymptotic distributions. Then the researcher can be confident that a given sample is actually
informative on a hypothesis of interest. We demonstrate that $X^2$ being an over-differenced sequence in fact
does not have desirable properties. First, we show that in the leading case of expectation estimation, such
an over-differenced process produces a degenerate asymptotic distribution for the sample mean estimator.
In particular, it produces a statistic that is tantamount to using only the first and last data points in estimation.
Second, we show that the nonlinear least squares estimator for the moving average coefficient in
an over-differenced $X^2$ has the same asymptotic properties as that for the autoregressive coefficient in the
levels of a difference stationary process. Thus, if macroeconomists are using first-differenced data because
they do not wish to use the nonstandard distribution theory associated with unit root processes, they should
recognize that they are faced with precisely the same problem when they first-difference a process that is
truly trend-stationary. Third, we show that using an over-differenced sequence $X^2$ in a bivariate causality
test will produce spurious evidence of causality, when in fact the data are actually not Granger causally
related. This last point may be found in Sims (1972) as well, although it does not appear to be well-known.

To begin, consider the problem of estimating the mean of $X^2$. For a sample of size $n$, the sample mean
is:

$$n^{-1} \sum_{t=1}^{n} X^2_t = \beta_0 + n^{-1} (u_n - u_0).$$

Notice that only two (covariance stationary) random variables $u_n$ and $u_0$ enter the calculation of the sample
mean statistic. Thus the sample mean converges to $\beta_0$, but at a rate that does not allow a nondegenerate
asymptotic distribution for (any normalized version of) the statistic. In words, after first-differencing a
trend-stationary sequence, the sample mean and associated statistics are econometrically useless. Econometric
techniques that rely in part on a central limit property for the sample mean will consequently not allow any
valid statistical inference. Thus, we see that taking first differences, when the true data generating process is
actually trend stationary, will produce data that may not be particularly informative for statistical inference.
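
The telescoping behind the sample mean can be illustrated directly. This is a minimal sketch, assuming Gaussian white-noise fluctuations about trend and illustrative parameter values; the helper names are hypothetical.

```python
import random

def trend_stationary(n, alpha0, beta0, seed=1):
    """Y_t = alpha0 + beta0 * t + u_t for t = 0, ..., n, with Gaussian u_t (illustrative)."""
    rng = random.Random(seed)
    return [alpha0 + beta0 * t + rng.gauss(0.0, 1.0) for t in range(n + 1)]

def mean_of_first_differences(y):
    """Sample mean of X_t = Y_t - Y_{t-1}, t = 1, ..., n."""
    diffs = [y[t] - y[t - 1] for t in range(1, len(y))]
    return sum(diffs) / len(diffs)

N = 500
y = trend_stationary(n=N, alpha0=2.0, beta0=0.5)

mean_diff = mean_of_first_differences(y)
# The same number computed from the first and last observations only.
endpoint_version = (y[-1] - y[0]) / N
```

The two quantities agree up to rounding error, so the other 499 observations contribute nothing to the estimate of the mean.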

For our second point, suppose for simplicity that $\beta_0$ is 0, and that $u_0$ is 0. Consider estimating the
model:

$$X^2_t = u_t - \gamma_0 u_{t-1}, \quad t = 1, 2, \ldots$$

$$u_0 = 0, \quad \gamma_0 = 1.$$

For parameter $\gamma$, define the residual function:

$$R_0(\gamma) = 0,$$

$$R_t(\gamma) = X^2_t + \gamma R_{t-1}(\gamma), \quad t = 1, 2, \ldots.$$

Thus, the true disturbance $u_t$ is obtained as $R_t(1)$ for all $t \geq 1$. Estimation by nonlinear least squares solves
the problem:

$$\min_{\gamma} \; \frac{1}{2} \sum_{t=1}^{n} R_t(\gamma)^2.$$


To obtain the asymptotic properties of such an estimator, it is convenient to study the asymptotic distribution
of the score. For the $t$-th term, the score is $s_t(\gamma) = R_t(\gamma) \cdot (\partial R_t(\gamma) / \partial \gamma)$. But from above, the random
sequence $\partial R_t(\gamma) / \partial \gamma$ at the true parameter value $\gamma_0 = 1$ is itself a unit root process, with first difference
equal to $u_{t-1}$. Then it follows from the initial condition $\partial R_0(1) / \partial \gamma = 0$, that the score at the true parameter
$\gamma_0$ behaves as:

$$s_1(1) = u_1 \cdot 0 = 0,$$

$$s_t(1) = u_t \cdot \sum_{j=1}^{t-1} u_j, \quad t \geq 2.$$

Notice that the score is the product of a covariance stationary sequence $u_t$ with its accumulation $\sum_{j=1}^{t-1} u_j$.
Clearly, the resulting process will not have the usual central limit properties; thus neither will the nonlinear
least squares estimator of the coefficient. Moreover, the Hessian itself can be seen to have nonstandard
asymptotic properties. In fact, the score here has exactly the same form as the score in least squares
estimation of the auto-regressive coefficient in a difference-stationary process.1 Thus (the score for) the
nonlinear least squares estimator of the moving average coefficient in an "over-differenced process" converges
not to a normal random variable, but instead to a now-familiar functional of Brownian motion. If the data
were truly difference stationary to begin, then of course the usual asymptotic normality theory would apply.
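
The recursions for the residual function and its derivative can be traced numerically. The sketch below assumes Gaussian white noise for $u_t$ and uses hypothetical helper names; it evaluates both recursions at the true value $\gamma_0 = 1$ and compares them with $u_t$ and with the partial sums of $u$.

```python
import random

def residual_and_derivative(x, gamma):
    """Run R_t = X_t + gamma * R_{t-1} with R_0 = 0, and its derivative in gamma,
    D_t = R_{t-1} + gamma * D_{t-1} with D_0 = 0. Index 0 holds the initial values."""
    r, d = [0.0], [0.0]
    for x_t in x:
        d.append(r[-1] + gamma * d[-1])  # uses R_{t-1}, so update D before R
        r.append(x_t + gamma * r[-1])
    return r, d

n = 300
rng = random.Random(2)
u = [0.0] + [rng.gauss(0.0, 1.0) for _ in range(n)]   # u[0] = 0 as in the text
x = [u[t] - u[t - 1] for t in range(1, n + 1)]        # over-differenced series, beta0 = 0

r, d = residual_and_derivative(x, gamma=1.0)

# partial_sums[t] = u_1 + ... + u_{t-1}, the empty sum being zero for t = 1
partial_sums = [0.0, 0.0]
for t in range(2, n + 1):
    partial_sums.append(partial_sums[-1] + u[t - 1])

score = [r[t] * d[t] for t in range(1, n + 1)]        # s_t(1) for t = 1, ..., n

max_residual_gap = max(abs(r[t] - u[t]) for t in range(1, n + 1))
max_derivative_gap = max(abs(d[t] - partial_sums[t]) for t in range(1, n + 1))
```

Both gaps are zero up to rounding error: at $\gamma = 1$ the residual recovers $u_t$, and the derivative is the accumulated sum of past disturbances, so each score term is a stationary variable multiplied by an integrated one.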

The  distribution  theory  for  the  non-standard  (first-differenced)  case  is  now  known,  so  it  is  not  that  a 
procedure  taking  this  into  account  is  completely  intractable.  (We  have  not  seen  that  any  economist  using 
first-differenced  data  has  actually  used  this  however.)  Our  point  instead  is  that  the  distribution  theory  that 
applies  to  first-differenced  data  is  different  across  trend  stationary  and  difference  stationary  models.  Thus, 
the  unifying  framework  that  authors  such  as  Campbell  and  Mankiw  (1988)  seem  to  suggest  for  the  use  of 
first-differenced data is simply non-existent.

1 This is related to a familiar result from Lagrange Multiplier theory for standard models: for instance,
Godfrey (1978) shows that Lagrange Multiplier tests against stationary autoregressive and moving average
alternatives are identical when the null is white noise; the Lagrange Multiplier of course is just the score.
That literature is however not particularly concerned with unit root models.

Finally, we turn to the use of over-differenced processes in Granger causality tests when the true lag
length is unknown and instead is made a function of sample size. This last condition may seem unusual, but
the reader should remember that applied researchers often do have to decide on a lag-length specification in
time  domain  work.  They  do  this  based  on  observed  data  (and  hence  its  sample  size),  and  of  course  the  "true" 
lag  length  is  never  known  for  certain.  Consider  two  random  sequences  Z  and  Y,  in  whose  causality  relations 
we are interested. We will use the notation $X^2$ to indicate a variable that is the outcome of first-differencing
Y, which is already stationary: thus $X^2$ has a zero in its spectral density at frequency zero. Suppose that
Z is a covariance stationary process with spectral density bounded away from zero. From Sims's Theorem
2, the causality relations between Z and Y can be studied by considering the two-sided projection of one
variable on current, lagged and future values of the other. For simplicity assume that Y is just white noise,
and that Z and Y are in fact uncorrelated at all leads and lags. Thus neither Z nor Y Granger causes the
other: the two-sided projections are identically zero and thus are trivially one-sided. A researcher using Z
and Y will necessarily discover this for sufficiently large sample sizes.

However consider instead the two-sided projection of Z on $X^2$:

$$Z_t = \sum_{j=-\infty}^{\infty} b_0(j) X^2_{t-j} + v_t, \qquad E\, v_t X^2_{t-j} = 0 \text{ for all } j.$$

The true projection coefficients $b_0(j)$ again are zero. Equivalently, the true (optimal) $R^2$ is zero, and least
squares estimation will attempt to achieve this. Let $n$ denote sample size, and call the sequence of fitted
projections $b_n(j)$, $j = -\infty, \ldots, \infty$, $n \geq 1$. Call $S_X(\omega)$ the spectral density of $X^2$, and let $\tilde{b}$ denote the
Fourier transform of a (square summable) sequence $b$. The difference in mean squared error between fitted
projections (implied by) $b_n$ and the true best fit (implied by) $b_0$ is given by Sims's approximation error
formula as:

$$[d_X(b_n, b_0)]^2 = \frac{1}{2\pi} \int_{-\pi}^{\pi} \left| \tilde{b}_n(\omega) - \tilde{b}_0(\omega) \right|^2 S_X(\omega) \, d\omega = E \left| \sum_{j=-\infty}^{\infty} b_n(j) X^2_{t-j} \right|^2.$$


Now consider the following family of candidate fitted projection coefficients: for $n \geq 1$, $b_n(j) = n^{-1}(n - |j|)$
for $|j| = 0, 1, \ldots, n$ and 0 otherwise. For each $n$, this is a triangular shaped two-sided lag distribution
symmetric about 0: the distribution takes the value 1 at $j = 0$, and then declines as a straight line to 0
at lag and lead $j = n$. But when $X^2_t = Y_t - Y_{t-1} = u_t - u_{t-1}$ with $u$ white noise, such a family of lag
distributions implies a deterioration in mean squared error given by:

$$[d_X(b_n, b_0)]^2 = E \left| \sum_{j=-\infty}^{\infty} b_n(j) \left[ u_{t-j} - u_{t-j-1} \right] \right|^2$$

$$= E \left| b_n(-n) u_{t+n} + \sum_{j=-n+1}^{n} \left[ b_n(j) - b_n(j-1) \right] u_{t-j} - b_n(n) u_{t-n-1} \right|^2$$

$$= \mathrm{Var}(u) \sum_{j=-n+1}^{n} \left( b_n(j) - b_n(j-1) \right)^2$$

$$= 2 n^{-1} \mathrm{Var}(u) \to 0 \text{ as } n \to \infty.$$

As sample size increases to infinity, such a family of lag distributions fits arbitrarily well in the sense of mean
squared error. However, all members of this family are two-sided: thus Granger causality statistics produced
at any sample size would lead the researcher to conclude incorrectly that Z Granger causes $X^2$.
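
The last step of the calculation, that the squared increments of the triangular lag distribution sum to $2/n$, is easy to confirm by direct summation. The sketch below uses hypothetical function names and takes $\mathrm{Var}(u) = 1$ by default.

```python
def triangular_lag(n, j):
    """b_n(j) = (n - |j|) / n for |j| <= n, and 0 otherwise."""
    return (n - abs(j)) / n if abs(j) <= n else 0.0

def mse_deterioration(n, var_u=1.0):
    """Var(u) times the sum over j = -n+1, ..., n of (b_n(j) - b_n(j-1))^2.

    The boundary terms b_n(-n) and b_n(n) in the derivation are both zero
    for the triangular family, so they are omitted here."""
    increments = (triangular_lag(n, j) - triangular_lag(n, j - 1)
                  for j in range(-n + 1, n + 1))
    return var_u * sum(inc * inc for inc in increments)

# Compare with the closed form 2 * Var(u) / n for a few sample sizes.
gaps = {n: abs(mse_deterioration(n) - 2.0 / n) for n in (5, 50, 500)}
```

Each of the $2n$ increments has absolute value $1/n$, which gives the $2/n$ rate: the two-sided fit improves without bound even though the true projection is zero.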

This  rather  surprising  result  can  be  understood  in  terms  of  ordinary  least  squares  theory.  The  spectral 
density of the right hand side variable in a distributed lag regression corresponds to the X'X matrix in
ordinary  least  squares  models.  If  the  spectral  density  is  zero  at  some  frequency,  that  is  equivalent  to 
singularity  in  the  X'X.  This  can  be  interpreted  to  mean  that  the  population  regression  is  not  identified.  In 
the  case  of  interest  here,  this  is  exactly  what  happens.  The  lag  distribution  is  not  uniquely  identified:  more 
than  one  distribution  will  fit  the  data  equally  well.  Using  first-differenced  data  therefore  renders  particularly 
subtle  the  interpretation  of  Granger  causality  statistics. 

We  emphasize  that  this  is  fundamentally  different  from  the  usual  prefiltering  effects.  It  is  easy  to  show 
that with covariance stationary data, prefiltering by arbitrary one-sided filters whose Fourier transforms are
bounded  away  from  zero  leaves  unaltered  patterns  of  Granger  causality.  The  result  here  is  that  prefiltering 
one  of  the  series  by  first-differencing  does  affect  Granger  causality  relations. 

Thus,  contrary  to  the  sanguine  conclusion  that  over-differencing  a  trend-stationary  process  will  simply 
"show  up  as  a  unit  root  in  the  moving  average  part,"  we  conclude  that  this  may  instead  produce  altogether 
unreliable  econometric  results.  It  is  true  that  in  either  case  of  difference  stationarity  or  trend  stationarity,  the 
resulting  process  is  always  covariance  stationary.  However,  the  resulting  process  is  not  necessarily  amenable 
to  econometric  analysis  when  the  data  were  trend  stationary  to  begin.  We  have  presented  three  examples  that 
seem  to  us  especially  convincing  of  the  difficulties  associated  with  using  "over-differenced"  data.  At  the  same 
time, these examples are of particular interest to macroeconomists. The first and third examples (estimating
the  mean  and  testing  for  Granger  causality)  are  related  in  that  they  are  both  due  directly  to  the  spectral 
density  of  an  over-differenced  process  vanishing  at  frequency  zero.  Information  on  such  a  process  does  not 
accumulate  as  the  observed  sample  size  increases.  Our  second  example  cautions  that  first-differencing  to 
produce  stationarity,  no  matter  whether  the  original  data  are  difference-  or  trend-stationary,  is  by  no  means 
a panacea for the econometric difficulties confronting researchers when they analyze trending time series.
In  particular,  if  they  wish  to  avoid  the  nonstandard  distribution  theory  associated  with  analyzing  difference 
stationary  processes  in  levels,  they  should  realize  that  they  have  simply  re-created  those  difficulties  when 
they  take  first-differences  of  a  trend-stationary  sequence. 

4.  The  Effects  of  Least  Squares  Detrending. 

In  the  previous  two  sections,  we  used  the  convenient  fiction  that  detrending  involved  removing  a  fixed 
although  arbitrary  time  trend.  We  now  show  that  appropriate  versions  of  the  reasoning  above  apply  even 
when  the  detrending  line  is  estimated  by  regression. 

First,  we  establish  that  if  the  data  have  a  unit  root,  the  correlogram  estimated  from  detrended  data 
converges  pointwise  at  each  lag  to  the  correct  value  of  1.  This  may  initially  seem  surprising:  when  the  data 
generating  process  has  a  unit  root,  the  OLS  estimator  for  the  intercept  is  known  to  diverge  (see  for  instance 
Durlauf  and  Phillips  (1986)).  Thus  one  might  conjecture  that  the  detrended  residuals  have  no  desirable 
properties. 

However, recall that the Durbin-Watson statistic in that case nevertheless converges in probability to its
correct value of zero (again, see Durlauf and Phillips (1986)). On further thought, this happens because the
Durbin-Watson statistic involves the first difference of the detrended data. Thus the ill-behaved estimator
for the intercept is simply subtracted out, and its lack of convergence in probability is inconsequential.

Next, the estimator for the time trend coefficient only converges at rate $\sqrt{n}$, whereas it gets multiplied
by a variable (time) growing as $n$, so that the error from using estimated rather than true residuals grows
with the sample size. However, again recall that the Durbin-Watson statistic involves the first difference of
the fitted residuals. Thus it is only the first difference of the time variable that is relevant. This of course
is just constant. This feature controls the rate at which the estimation error grows, and in particular, that
error actually converges at rate $\sqrt{n}$ in probability.

But now we note that this reasoning applies not only at the first lag, as for the Durbin-Watson statistic,
but in fact applies at every fixed lag. Thus even in the difference stationary case it is asymptotically irrelevant
that we use estimated rather than true residuals in estimating the correlogram.

For the trend stationary case, the estimators for the intercept and time trend coefficients converge in
probability at rates $n^{1/2}$ and $n^{3/2}$ respectively. Thus in this case, it is straightforward to establish that
the correlogram estimator using estimated residuals converges in probability to the correlogram of the true
residuals. From the discussion above, and the formal results below, we are able to establish that this
statement extends to difference stationary data as well. Note that by contrast with trend-stationary data,
in the difference-stationary case, the estimator for the intercept diverges at rate $n^{1/2}$, and the estimator for
the time trend coefficient converges only at the slower rate $n^{1/2}$.
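
The asymptotic claims about the correlogram of least squares detrended data can be illustrated with a single simulated draw of each type. This is only a sketch under stated assumptions: Gaussian white-noise disturbances, one fixed seed, a sample size of 5000, and hypothetical helper names. It is not a substitute for the formal results.

```python
import random

def ols_detrend(y):
    """Residuals from least squares regression of y_t on a constant and t = 1, ..., n."""
    n = len(y)
    t_bar = (n + 1) / 2.0
    y_bar = sum(y) / n
    sxx = sum((t - t_bar) ** 2 for t in range(1, n + 1))
    sxy = sum((t - t_bar) * (y_t - y_bar) for t, y_t in enumerate(y, start=1))
    slope = sxy / sxx
    return [y_t - y_bar - slope * (t - t_bar) for t, y_t in enumerate(y, start=1)]

def correlogram(e, lag):
    """Sample autocorrelation of the residuals e at the given lag.
    OLS residuals already have mean zero, so no further demeaning is applied."""
    denom = sum(v * v for v in e)
    num = sum(e[t] * e[t - lag] for t in range(lag, len(e)))
    return num / denom

def random_walk(n, seed):
    """Difference-stationary data: cumulated Gaussian white noise."""
    rng = random.Random(seed)
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path

def white_noise_about_trend(n, seed, alpha0=1.0, beta0=0.2):
    """Trend-stationary data: Gaussian white noise about a linear trend."""
    rng = random.Random(seed)
    return [alpha0 + beta0 * t + rng.gauss(0.0, 1.0) for t in range(1, n + 1)]

n = 5000
r1_unit_root = correlogram(ols_detrend(random_walk(n, seed=3)), lag=1)
r1_trend_stationary = correlogram(ols_detrend(white_noise_about_trend(n, seed=3)), lag=1)
```

In a sample of this size the first value should sit close to 1 and the second close to 0, which is the pattern the text describes: in each case the detrended residuals point to the dynamics of the underlying process.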

Second,  we  turn  to  behavior  in  finite  samples.  Nelson  and  Kang  (1981)  have  argued  that  the  use  of 
least  squares  detrended  data,  when  in  fact  the  data  are  difference  stationary,  results  in  estimators  for  the 
covariogram and the spectral density that are biased towards showing cyclicality in finite samples. We
interpret  this  as  saying  that  incorrect  econometric  specification  will  lead  to  an  incorrect  conclusion.  In  our 
view,  this  argument  carries  weight  only  if  correct  specification  in  this  context  does  not  lead  to  that  same 
erroneous  conclusion.  Suppose  on  the  other  hand,  that  the  true  data  generating  mechanism  is  stationary 
about  trend.  Then,  the  "best  practice"  procedure  is  to  detrend  by  least  squares:  it  is  irrelevant  both 
asymptotically  and  in  finite  samples  whether  one  estimates  the  residual  serial  correlation  simultaneously 
with  the  trend  line,  or  with  a  two-step  procedure  by  first  detrending.  In  this  situation,  would  a  researcher 
following "best practice" similarly discover spurious cyclicality? The answer is yes.

If  the  true  data  generating  mechanism  were  white  noise  about  deterministic  trend,  the  fitted  residuals 
will  necessarily  be  serially  correlated.  This  follows  directly  from  the  properties  of  BLUS  residuals  (see  for 
example  introductory  textbooks  such  as  Theil  (1971)).  We  show  below  that  for  every  finite  sample,  at  any 
fixed  lag,  the  exact  bias  in  the  covariogram  estimator  is  "continuous"  in  the  serial  persistence  of  the  data 
generating  mechanism,  in  a  neighborhood  containing  unit  root  processes.  Loosely  speaking,  in  finite  samples, 
there is certainly a nonzero bias in the estimated dynamics of an inappropriately detrended unit root process;
however,  this  bias  is  only  of  the  same  order  of  magnitude  as  the  bias  when  trend  stationary  processes  are 
correctly  detrended. 
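
For the white noise about trend case, the exact finite sample bias of the covariogram estimator can be computed from the least squares residual-maker matrix. The sketch below assumes the estimator $n^{-1} \sum_{t=k+1}^{n} \hat{e}_t \hat{e}_{t-k}$ with ordinary least squares residuals $\hat{e}_t$ from a regression on a constant and $t = 1, \ldots, n$; this choice of estimator and the function name are assumptions made for illustration, not the paper's own calculation.

```python
def expected_covariogram_white_noise(n, lag, sigma2=1.0):
    """Exact expectation of (1/n) * sum_{t=lag+1}^{n} e_t * e_{t-lag}, where e are OLS
    residuals from regressing white noise (variance sigma2) about a linear trend on
    a constant and t. Uses E[e e'] = sigma2 * (I - H), with hat matrix entries
    H[t, s] = 1/n + (t - t_bar) * (s - t_bar) / sxx."""
    t_bar = (n + 1) / 2.0
    sxx = sum((t - t_bar) ** 2 for t in range(1, n + 1))
    total = 0.0
    for t in range(lag + 1, n + 1):
        s = t - lag
        hat_ts = 1.0 / n + (t - t_bar) * (s - t_bar) / sxx
        identity_ts = 1.0 if lag == 0 else 0.0
        total += identity_ts - hat_ts
    return sigma2 * total / n

# The true autocovariance of white noise is zero at every nonzero lag,
# so the expectation at lag 1 is itself the bias.
bias_lag1 = expected_covariogram_white_noise(n=50, lag=1)
variance_term = expected_covariogram_white_noise(n=50, lag=0)
```

With $n = 50$ the lag-1 expectation is negative, so correctly detrended white noise already shows serial correlation in the fitted residuals on average; the lag-0 term equals $(n-2)/n$ times the true variance, the familiar degrees-of-freedom loss.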

One  possible  conclusion  from  this  is  that  the  message  due  to  Nelson  and  Kang  (1981)  applies  to  persistent 
trend-stationary  models  as  well  as  to  difference-stationary  models:  researchers  will  just  always  find  spurious 
cyclicality,  regardless  of  the  true  model.  We  think  this  is  a  little  extreme:  our  preferred  interpretation  instead 
is that finite sample theory does not suggest that detrending is a bad thing to do. Putting this together with
our  results  for  the  asymptotic  behavior  of  the  detrended  residuals,  we  conclude  that  there  is  no  evidence  that 
detrending  produces  misleading  results  when  the  true  model  has  a  unit  root.  Of  course  using  the  correct 
econometric  specification  is  always  better  than  using  an  incorrect  specification,  but  we  do  not  think  that  is 
the  issue  here. 

Some  economists  have  suggested  to  us  that  "if  you  just  look  at  the  picture  of  fitting  a  trend  line  to  a 
unit  root  process,  you'll  see  clearly  why  there  is  spurious  cyclicality;  the  end-points  of  both  the  trend  and 
the  unit  root  process  are  made  to  come  close  to  each  other.  Thus  the  detrended  data  are  made  to  look 
cyclical.''  While  there  is  certainly  something  to  this  graphical  intuition,  its  effects  show  up  nowhere  in  our 
calculations below. Consequently, we believe this intuition to be incorrect, and we do not find these "look
at the picture"-type arguments persuasive.

To begin the formal analysis, we impose some standard assumptions on the heterogeneity and serial
dependence permitted in our data. Let $\{u_t\}_{t=1}^{\infty}$ be the disturbance identified above, and define the partial
sum sequence: $S_0 = 0$, $S_t = \sum_{j=1}^{t} u_j$. We impose the following conditions.

Assumption 4.1 (Regularity): Let $\{u_t\}_{t=1}^{\infty}$ satisfy:

(a.) $E u_t = 0$ for all $t$;

(b.) $E u_t^2 = \sigma_u^2 > 0$ for all $t$;

(c.) $\sup_t E|u_t|^{2+\delta} < \infty$ for some $\delta > 0$;

(d.) $\sigma_0^2 = \lim_{n \to \infty} E\left(n^{-1} S_n^2\right)$ exists, $0 < \sigma_0^2 < \infty$;

(e.) $\{u_t\}_{t=1}^{\infty}$ is strong mixing, with mixing coefficients $\alpha_m$ such that:

$$\sum_{m=1}^{\infty} \alpha_m^{1-2/\delta} < \infty.$$


We  use  these  conditions  as  they  are  by  now  relatively  familiar  in  the  literature:  they  are  not  the  weakest 
conditions  possible  to  obtain  our  results.  The  reader  is  referred  to  Phillips  (1987)  and  Phillips  and  Perron 
(1988) for further discussion of these conditions. Finite order covariance stationary ARMA processes with
Gaussian disturbances (where the moving average part does not have a unit root) can be shown to satisfy
these assumptions.

We  will  use  the  following  result  repeatedly  in  the  subsequent  discussion: 

Lemma 4.2 (Asymptotic Distributions): Assume the conditions of Assumption 4.1. Then as $n \to \infty$:

(a.) $n^{-1/2} \sum_{t=1}^{n} u_t \Rightarrow \sigma_0 N(0,1)$;

(b.) $n^{-3/2} \sum_{t=1}^{n} S_t \Rightarrow \sigma_0 \int_0^1 W(r)\,dr$;

(c.) $n^{-3/2} \sum_{t=1}^{n} t u_t \Rightarrow \sigma_0 \left( W(1) - \int_0^1 W(r)\,dr \right)$;

where $N(0,1)$ is the standard normal, $W$ is standard Brownian motion, and $\Rightarrow$ denotes weak convergence.
Our asymptotic results are expressed in the following:

Theorem 4.3 (Consistency): Let $\{Y_t, t = 1, 2, \ldots, n\}$ be an observed sample; let $\hat{u}_t(n)$ be the fitted
residuals from an OLS regression on a constant and time trend.

(a.) If $Y_t$ is trend-stationary, then for each fixed lag $j$, as $n \to \infty$:

$$\frac{1}{n} \sum_{t=j+1}^{n} \hat{u}_t(n)\hat{u}_{t-j}(n) - \frac{1}{n} \sum_{t=j+1}^{n} u_t u_{t-j} \to 0 \quad \text{in probability.}$$

(b.) If $Y_t$ is difference stationary, then for each fixed lag $j$, as $n \to \infty$:

$$\frac{\sum_{t=j+1}^{n} \hat{u}_t(n)\hat{u}_{t-j}(n)}{\sum_{t=1}^{n} \hat{u}_t(n)^2} - 1 \to 0 \quad \text{in probability.}$$

Thus in either case, detrending does not affect convergence in probability of the correlogram estimator to
the  correct  values.  While  this  result  is  reassuring,  and  initially  surprising  for  the  reasons  articulated  above, 
it  is  nevertheless  still  an  asymptotic  statement.    It  may  be  that  in  finite  samples,  detrending  a  unit  root 
process  is  severely  misleading. 
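
As a rough numerical illustration of Theorem 4.3 (not part of the formal argument), the following sketch detrends one simulated random walk with drift and one simulated trend-plus-white-noise series, then computes the lag-1 ratio that appears in part (b.). The data generator `simulate`, the drift value, the seed, and the Gaussian disturbances are all assumptions chosen for illustration only.

```python
import random

def detrend(y):
    """Return OLS residuals from regressing y on a constant and a linear time trend."""
    n = len(y)
    t_bar = (n + 1) / 2.0
    y_bar = sum(y) / n
    # Centered trend regressor; centering does not change the residuals.
    z = [t - t_bar for t in range(1, n + 1)]
    slope = sum(zt * (yt - y_bar) for zt, yt in zip(z, y)) / sum(zt * zt for zt in z)
    return [yt - y_bar - slope * zt for zt, yt in zip(z, y)]

def autocorr(res, j):
    """Sum over t = j+1..n of res_t * res_{t-j}, divided by the sum over t = 1..n of res_t^2."""
    num = sum(res[t] * res[t - j] for t in range(j, len(res)))
    den = sum(r * r for r in res)
    return num / den

def simulate(n, unit_root, seed=0, drift=0.5):
    """Hypothetical generator: random walk with drift, or linear trend plus white noise."""
    rng = random.Random(seed)
    u = [rng.gauss(0.0, 1.0) for _ in range(n)]
    if unit_root:
        y, level = [], 0.0
        for ut in u:
            level += drift + ut
            y.append(level)
        return y
    return [drift * t + ut for t, ut in enumerate(u, start=1)]

rw_resid = detrend(simulate(4000, unit_root=True))
ts_resid = detrend(simulate(4000, unit_root=False))
rho_rw = autocorr(rw_resid, 1)   # should be close to 1 for the unit root case
rho_ts = autocorr(ts_resid, 1)   # should be close to 0 for white noise about trend
print(rho_rw, rho_ts)
```

In this sketch the detrended random walk should show a lag-1 ratio near one and the detrended white-noise series a ratio near zero, which are the population values in the two cases. A single draw is of course only suggestive.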

We therefore establish a continuity proposition: If the data are trend-stationary, with the true disturbances
about trend highly correlated, then the finite sample bias in using detrended data is of the same
order of magnitude as the finite sample bias when the data are in fact difference stationary.

We  redefine  the  model  somewhat  for  convenience:  Let 

$$Y_t = \alpha_0 + \beta_0 t + \epsilon_t,$$

where $\epsilon_t$ will either be covariance stationary, or be generated as a unit root sequence with zero drift.

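Theorem 4.4 (Finite Sample Bias): Let $u_0$ be a random variable, and let $\{u_t\}_{t \ge 1}$ be uncorrelated with
$u_0$ and satisfy Assumption 4.1; let the disturbance $\{\epsilon_t\}_{t \ge 1}$ be generated by two alternative models:

(a.) $\epsilon_{\lambda t} = \lambda \epsilon_{\lambda,t-1} + u_t$, $t \ge 1$, $|\lambda| < 1$, $\epsilon_{\lambda 0} = u_0$;
(b.) $\epsilon_{1t} = \epsilon_{1,t-1} + u_t$, $t \ge 1$, $\epsilon_{10} = u_0$.

Given a sample of size $n \ge 3$, let $\hat{\epsilon}_{\lambda t}(n)$, $\hat{\epsilon}_{1t}(n)$, $t = 1, 2, \ldots, n$, denote the detrended data (fitted
residuals). For each $j = 0, 1, \ldots, n-1$, the exact biases in the covariogram estimator are:

$$B_\lambda(j,n) = E\left[ (n-j)^{-1} \sum_{t=j+1}^{n} \hat{\epsilon}_{\lambda t}(n)\hat{\epsilon}_{\lambda,t-j}(n) - (n-j)^{-1} \sum_{t=j+1}^{n} \epsilon_{\lambda t}\epsilon_{\lambda,t-j} \right]$$

and

$$B_1(j,n) = E\left[ (n-j)^{-1} \sum_{t=j+1}^{n} \hat{\epsilon}_{1t}(n)\hat{\epsilon}_{1,t-j}(n) - (n-j)^{-1} \sum_{t=j+1}^{n} \epsilon_{1t}\epsilon_{1,t-j} \right].$$

Then for each fixed $n \ge 3$, for each fixed $j = 0, 1, \ldots, n-1$:

$$B_\lambda(j,n) \to B_1(j,n) \quad \text{as } \lambda \uparrow 1.$$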
Case (b.) in the Theorem specifies a zero drift unit root process; however, since the true parameter $\beta_0$ is
arbitrary, our discussion certainly covers unit root processes with nonzero drift.

Theorem  4.4  states  the  following:  the  bias  that  arises  from  using  estimated  residuals  (detrended  by 
least  squares)  is  of  the  same  order  of  magnitude,  independent  of  whether  the  underlying  data  are  difference- 
stationary  or  trend-stationary. 
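
The continuity in Theorem 4.4 can be examined numerically for a small sample. The sketch below computes the exact bias as the difference between the expected covariogram of the detrended data and that of the true disturbances, using the covariance matrix of the disturbance and the least squares residual-maker matrix. It assumes, purely for illustration, that $u_0, u_1, \ldots, u_n$ are mutually uncorrelated with unit variance; the function name `exact_bias` is hypothetical.

```python
def exact_bias(lam, n, j):
    """Exact covariogram bias B(j, n) for eps_t = lam * eps_{t-1} + u_t, eps_0 = u_0.

    Illustrative assumption: u_0, u_1, ..., u_n are uncorrelated with unit variance.
    Setting lam = 1.0 gives the unit root case (b.).
    """
    # F[s][t] = E[eps_s eps_t] for s, t = 1..n, using eps_t = sum_{i=0}^{t} lam^(t-i) u_i.
    F = [[sum(lam ** (s - i) * lam ** (t - i) for i in range(min(s, t) + 1))
          for t in range(1, n + 1)] for s in range(1, n + 1)]
    # Residual-maker M = I - P for the orthogonal regressors (1, t - (n+1)/2).
    z = [t - (n + 1) / 2.0 for t in range(1, n + 1)]
    zz = sum(v * v for v in z)
    M = [[(1.0 if s == t else 0.0) - 1.0 / n - z[s] * z[t] / zz
          for t in range(n)] for s in range(n)]
    # Covariance matrix of the detrended data: M F M (M is symmetric).
    MF = [[sum(M[s][k] * F[k][t] for k in range(n)) for t in range(n)] for s in range(n)]
    MFM = [[sum(MF[s][k] * M[k][t] for k in range(n)) for t in range(n)] for s in range(n)]
    # Average over t = j+1..n of E[residual cross product] minus E[true cross product].
    return sum(MFM[t][t - j] - F[t][t - j] for t in range(j, n)) / (n - j)

n, j = 12, 1
b_unit = exact_bias(1.0, n, j)
for lam in (0.5, 0.9, 0.99, 0.999):
    print(lam, exact_bias(lam, n, j) - b_unit)
```

Under these assumptions the printed differences should shrink toward zero as `lam` approaches one, which is the content of the Theorem for fixed $n$ and $j$.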

To  summarize  the  results  of  this  section,  while  there  are  clearly  differences  in  the  behavior  of  least 
squares  estimators  of  trend  coefficients  across  trend-  and  difference-stationary  data,  the  resulting  detrended 
data  have  similarly  revealing  properties  for  their  true  underlying  dynamics. 

5.   Conclusion. 

The  specific  technical  contribution  of  this  paper  is  two-fold:  First  we  have  shown  that  the  correlogram 
estimators  using  least  squares  detrended  data  are  consistent  for  the  true  values  of  the  correlogram,  regardless 
of  whether  the  data  are  actually  trend-  or  difference-stationary.  Second,  we  have  established  that  the  finite 
sample bias in "cyclicality" that results from detrending difference-stationary data is no worse than that in
the  best  practice  case  of  detrending  trend-stationary  data. 

Our  conclusions  draw  on  both  exact  as  well  as  asymptotic  arguments.  This  is  in  keeping  with  the 
reasoning  that  has  motivated  the  erroneous  conclusions  we  list  in  the  Introduction.  We  observe  that  quite 
a  number  of  applied  workers  have  adopted  these  incorrect  statements  in  their  own  research.  We  emphasize 
that: 

(i)  removing  arbitrary  fixed  linear  time  trends  actually  preserves  the  true  properties  of  the  time  series  data, 
regardless  of  whether  the  true  model  is  difference  stationary  or  trend  stationary, 

(ii)  taking  first  differences,  if  the  true  model  is  in  fact  trend  stationary,  produces  data  that  are  econometri- 
cally  useless,  and, 

(iii)   least  squares  detrending  does  not  disguise  the  cyclical  properties  of  the  data. 

In  our  view,  the  discussion  here  overturns  the  conventional  wisdom  among  many  applied  researchers. 
Our  results  contradict  the  observation  that  incorrect  detrending  distorts  the  statistical  properties  of  the  data 
and  produces  "spurious  cyclicality,"  and  warn  against  the  undiscriminating  use  of  first  differencing:   First 
differencing is not a procedure we would recommend to researchers confronted with trending time series
data.
Appendix.


Proof of Lemma 4.2: Part (a.) is the usual central limit result; see e.g. White (1984), Theorem 5.19. The
remainder is simply Lemma 2.3 in Phillips and Perron (1988). Q.E.D.
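
The normalizations in Lemma 4.2 can be checked against exact second moments in the simplest special case. The sketch below assumes, for illustration only, that $u_t$ is i.i.d. with unit variance (so $\sigma_0 = 1$) and computes the exact variances of the three normalized sums. These should approach $1$, $1/3$, and $1/3$, which are the variances of $N(0,1)$, $\int_0^1 W(r)\,dr$, and $W(1) - \int_0^1 W(r)\,dr$ respectively. The function name `variances` is hypothetical.

```python
from fractions import Fraction

def variances(n):
    """Exact variances of the three normalized sums, assuming i.i.d. unit-variance u_t."""
    # (a) n^{-1/2} * sum_t u_t has variance n / n = 1.
    var_a = Fraction(n, n)
    # (b) sum_t S_t = sum_j (n - j + 1) u_j, so its variance is sum_{m=1}^{n} m^2.
    var_b = Fraction(sum(m * m for m in range(1, n + 1)), n ** 3)
    # (c) sum_t t * u_t has variance sum_{t=1}^{n} t^2.
    var_c = Fraction(sum(t * t for t in range(1, n + 1)), n ** 3)
    return var_a, var_b, var_c

for n in (10, 100, 10000):
    a, b, c = variances(n)
    print(n, float(a), float(b), float(c))
```

Matching variances is of course only a necessary check on the scaling, not a verification of weak convergence.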

To  prove  the  main  Theorems,  it  is  convenient  to  consider  an  alternative  regression  model: 


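$$Y_t = \theta_{01} + \theta_{02}\left(t - \frac{n+1}{2}\right) + \epsilon_t = Z_t\theta_0 + \epsilon_t,$$

where $Z_t = \left(1, \; t - \frac{n+1}{2}\right)$. This alters the original specification of $\theta_{01}$ to the extent that $\theta_{02}$ is different from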
zero,  but  it  is  the  specification  that  most  researchers  have  used,  see  for  example  Nelson  and  Kang  (1981), 
Durlauf  and  Phillips  (1986),  Phillips  and  Perron  (1988),  and  others.  Also,  it  is  extremely  convenient,  and 
the  modification  is  inconsequential  for  studying  the  regression  residuals.  The  alternative  assumptions  made 
in  the  text  regarding  the  disturbance  e  are  taken  to  apply  to  the  disturbances  of  this  equation,  and  detrended 
data  refers  to  the  fitted  residuals  of  this  equation. 
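
That centering the trend is inconsequential for the fitted residuals can be confirmed directly. The following sketch regresses a small made-up series (the numbers are arbitrary, chosen only for illustration) on a constant and $t$, and then on a constant and $t - \frac{n+1}{2}$, and compares the results. The helper `ols_residuals` is hypothetical.

```python
def ols_residuals(y, x):
    """Residuals, intercept, and slope from regressing y on a constant and x."""
    n = len(y)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    slope = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
             / sum((a - x_bar) ** 2 for a in x))
    intercept = y_bar - slope * x_bar
    return [b - intercept - slope * a for a, b in zip(x, y)], intercept, slope

y = [2.0, 2.9, 4.5, 4.8, 6.4, 7.1, 8.3]          # made-up observations
n = len(y)
t_raw = [float(t) for t in range(1, n + 1)]
t_centered = [t - (n + 1) / 2.0 for t in t_raw]

res_raw, a_raw, b_raw = ols_residuals(y, t_raw)
res_cen, a_cen, b_cen = ols_residuals(y, t_centered)

# Residuals and slope agree; only the intercept is shifted by slope * (n + 1) / 2.
print(max(abs(r - c) for r, c in zip(res_raw, res_cen)))
print(a_cen - (a_raw + b_raw * (n + 1) / 2.0))
```

Both printed quantities should be zero up to floating-point rounding, reflecting that the two regressor sets span the same space.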

We  abuse  notation  and  call  processes  covariance  stationary  if  they  satisfy  Assumption  4.1.  Similarly  call 
processes  difference  stationary  if  their  first  differences  satisfy  Assumption  4.1.  We  first  establish  a  Lemma; 
this  comprises  known  results,  but  is  easy,  and  we  place  it  here  for  completeness. 

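Lemma A. Let $\hat{\theta}_n = (\hat{\theta}_{n1}, \hat{\theta}_{n2})'$ denote the OLS estimator for $\theta_0$.
(a.) When $\epsilon$ is covariance stationary,

$$n^{3/2}\left(\hat{\theta}_{n2} - \theta_{02}\right) \Rightarrow 6\sigma_0 \left[ W(1) - 2\int_0^1 W(r)\,dr \right].$$

(b.) When $\epsilon$ is difference stationary,

$$n^{-1/2}\left(\hat{\theta}_{n1} - \theta_{01}\right) \Rightarrow \sigma_0 \int_0^1 W(r)\,dr,$$

$$n^{1/2}\left(\hat{\theta}_{n2} - \theta_{02}\right) \Rightarrow 6\sigma_0 \left[ 2\int_0^1 rW(r)\,dr - \int_0^1 W(r)\,dr \right].$$
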
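Proof of Lemma A. By the usual OLS formula,

$$\hat{\theta}_n - \theta_0 = \left( \sum_{t=1}^{n} Z_t'Z_t \right)^{-1} \sum_{t=1}^{n} Z_t'\epsilon_t.$$

Let

$$\left( D_1(n), \; D_2(n) \right) = \begin{cases} \left( n^{-1/2}, \; n^{-3/2} \right) & \text{if } \epsilon \text{ is covariance stationary;} \\ \left( n^{-3/2}, \; n^{-5/2} \right) & \text{if } \epsilon \text{ is difference stationary.} \end{cases}$$

Since $\sum_{t=1}^{n} Z_t'Z_t$ is diagonal with entries $n$ and $n(n^2-1)/12$, the latter being $\frac{1}{12}n^3$ to leading order, rewrite the above as:

$$\hat{\theta}_n - \theta_0 = \begin{pmatrix} n & 0 \\ 0 & \frac{1}{12}n^3 \end{pmatrix}^{-1} \begin{pmatrix} D_1(n) & 0 \\ 0 & D_2(n) \end{pmatrix}^{-1} \begin{pmatrix} D_1(n) \sum_{t=1}^{n} \epsilon_t \\ D_2(n) \sum_{t=1}^{n} \left(t - \frac{n+1}{2}\right)\epsilon_t \end{pmatrix},$$

which implies that:

$$\begin{pmatrix} D_1(n)\,n & 0 \\ 0 & \frac{1}{12} D_2(n)\,n^3 \end{pmatrix} \left( \hat{\theta}_n - \theta_0 \right) = \begin{pmatrix} D_1(n) \sum_{t=1}^{n} \epsilon_t \\ D_2(n) \sum_{t=1}^{n} \left(t - \frac{n+1}{2}\right)\epsilon_t \end{pmatrix}.$$

(a.) When $\epsilon$ is covariance stationary:

$$D_1(n) \sum_{t=1}^{n} \epsilon_t = n^{-1/2} \sum_{t=1}^{n} \epsilon_t \Rightarrow \sigma_0 N(0,1);$$

$$D_2(n) \sum_{t=1}^{n} \left(t - \frac{n+1}{2}\right)\epsilon_t = n^{-3/2} \sum_{t=1}^{n} t\epsilon_t - \frac{n+1}{2n}\, n^{-1/2} \sum_{t=1}^{n} \epsilon_t \Rightarrow \sigma_0 \left( W(1) - \int_0^1 W(r)\,dr - \frac{1}{2}W(1) \right).$$

Thus:

$$n^{1/2}\left(\hat{\theta}_{n1} - \theta_{01}\right) \Rightarrow \sigma_0 N(0,1);$$

$$n^{3/2}\left(\hat{\theta}_{n2} - \theta_{02}\right) \Rightarrow 6\sigma_0 \left[ W(1) - 2\int_0^1 W(r)\,dr \right].$$

(b.) When $\epsilon$ is difference stationary:

$$D_1(n) \sum_{t=1}^{n} \epsilon_t = n^{-3/2} \sum_{t=1}^{n} \epsilon_t \Rightarrow \sigma_0 \int_0^1 W(r)\,dr;$$

$$D_2(n) \sum_{t=1}^{n} \left(t - \frac{n+1}{2}\right)\epsilon_t = n^{-5/2} \sum_{t=1}^{n} t\epsilon_t - \frac{n+1}{2n}\, n^{-3/2} \sum_{t=1}^{n} \epsilon_t \Rightarrow \sigma_0 \left[ \int_0^1 rW(r)\,dr - \frac{1}{2}\int_0^1 W(r)\,dr \right].$$

Thus:

$$n^{-1/2}\left(\hat{\theta}_{n1} - \theta_{01}\right) \Rightarrow \sigma_0 \int_0^1 W(r)\,dr;$$

$$n^{1/2}\left(\hat{\theta}_{n2} - \theta_{02}\right) \Rightarrow 6\sigma_0 \left[ 2\int_0^1 rW(r)\,dr - \int_0^1 W(r)\,dr \right].$$

This establishes the Lemma. Q.E.D.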

Part  (b.)  is  due  to  Durlauf  and  Phillips  (1986);  we  have  not  been  able  to  find  a  reference  for  part  (a.), 
although  it  seems  to  be  relatively  well-known. 
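
As a sanity check on the scaling in part (a.), consider the special case (an assumption made here only for illustration) in which $\epsilon_t$ is i.i.d. with variance $\sigma^2$, so that $\sigma_0 = \sigma$. The exact finite sample variance of the normalized slope estimator can then be compared with the variance of the limit stated in Lemma A, using $\mathrm{Var}\,W(1) = 1$, $\mathrm{Cov}\left(W(1), \int_0^1 W(r)\,dr\right) = 1/2$, and $\mathrm{Var}\int_0^1 W(r)\,dr = 1/3$:

```latex
\begin{align*}
\operatorname{Var}\!\left[ n^{3/2}\left(\hat{\theta}_{n2} - \theta_{02}\right) \right]
  &= \frac{n^{3}\sigma^{2}}{\sum_{t=1}^{n}\left(t - \frac{n+1}{2}\right)^{2}}
   = \frac{12\,n^{2}}{n^{2}-1}\,\sigma^{2}
   \;\longrightarrow\; 12\,\sigma^{2}, \\
\operatorname{Var}\!\left[ 6\sigma_0\left( W(1) - 2\int_0^1 W(r)\,dr \right) \right]
  &= 36\,\sigma_0^{2}\left( 1 - 4\cdot\tfrac{1}{2} + 4\cdot\tfrac{1}{3} \right)
   = 12\,\sigma_0^{2}.
\end{align*}
```

The two variances agree in the limit, as they should if the normalization and the constant $6\sigma_0$ in part (a.) are correct.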

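Proof of Theorem 4.3 (Consistency): By the definition,

$$\hat{\epsilon}_t(n) = \epsilon_t - Z_t\left(\hat{\theta}_n - \theta_0\right),$$

so that:

$$\hat{\epsilon}_t(n)\hat{\epsilon}_{t-j}(n) = \epsilon_t\epsilon_{t-j} + Z_t\left(\hat{\theta}_n - \theta_0\right)\left(\hat{\theta}_n - \theta_0\right)'Z_{t-j}' - \left(\hat{\theta}_n - \theta_0\right)'\left[ Z_t'\epsilon_{t-j} + Z_{t-j}'\epsilon_t \right]$$

$$= \epsilon_t\epsilon_{t-j} + \mathrm{tr}\left[ \left(\hat{\theta}_n - \theta_0\right)\left(\hat{\theta}_n - \theta_0\right)'Z_{t-j}'Z_t \right] - \left(\hat{\theta}_n - \theta_0\right)'\left[ Z_t'\epsilon_{t-j} + Z_{t-j}'\epsilon_t \right].$$

(a.) Suppose $\epsilon$ is covariance stationary. Summing over $t$ and rewriting the above, to leading order,

$$\sum_t \hat{\epsilon}_t(n)\hat{\epsilon}_{t-j}(n) - \sum_t \epsilon_t\epsilon_{t-j} = \mathrm{tr}\left[ \left(\hat{\theta}_n - \theta_0\right)\left(\hat{\theta}_n - \theta_0\right)' \begin{pmatrix} n & 0 \\ 0 & \frac{1}{12}n^3 \end{pmatrix} \right] - \left(\hat{\theta}_n - \theta_0\right)' \sum_t \left[ Z_t'\epsilon_{t-j} + Z_{t-j}'\epsilon_t \right].$$

But by Lemma A, $\hat{\theta}_{n1} - \theta_{01} = O_p(n^{-1/2})$, and $\hat{\theta}_{n2} - \theta_{02} = O_p(n^{-3/2})$, which imply that $\left(\hat{\theta}_{n1} - \theta_{01}\right)^2
= O_p(n^{-1})$, and $\left(\hat{\theta}_{n2} - \theta_{02}\right)^2 = O_p(n^{-3})$. Thus the first term on the right hand side is $O_p(1) + O_p(1)
= O_p(1)$. Further, by Lemma 4.2, $\sum_t \epsilon_t = O_p(n^{1/2})$ and $\sum_t t\epsilon_t = O_p(n^{3/2})$, so that the second term on
the right hand side is again $O_p(1) + O_p(1) = O_p(1)$. Thus the left hand side satisfies:

$$\sum_t \hat{\epsilon}_t(n)\hat{\epsilon}_{t-j}(n) - \sum_t \epsilon_t\epsilon_{t-j} = O_p(1),$$

so that

$$\frac{1}{n} \sum_t \hat{\epsilon}_t(n)\hat{\epsilon}_{t-j}(n) - \frac{1}{n} \sum_t \epsilon_t\epsilon_{t-j} \to 0 \quad \text{in probability as } n \to \infty.$$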


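(b.) Suppose $\epsilon$ is difference stationary. Write:

$$\frac{\sum_{t=j+1}^{n} \hat{\epsilon}_t(n)\hat{\epsilon}_{t-j}(n)}{\sum_{t=1}^{n} \hat{\epsilon}_t(n)^2} - 1 = \frac{\sum_{t=j+1}^{n} \hat{\epsilon}_t(n)\left[ \hat{\epsilon}_{t-j}(n) - \hat{\epsilon}_t(n) \right]}{\sum_{t=1}^{n} \hat{\epsilon}_t(n)^2} - \frac{\sum_{t=1}^{j} \hat{\epsilon}_t(n)^2}{\sum_{t=1}^{n} \hat{\epsilon}_t(n)^2},$$

where the last term involves only the $j$ initial residuals; its numerator is $O_p(n)$, so it is handled in the same way as the first term below.

Without loss of generality, suppose $j > 0$. Then:

$$\hat{\epsilon}_{t-j}(n) - \hat{\epsilon}_t(n) = \left( \epsilon_{t-j} - \epsilon_t \right) + \left( Z_t - Z_{t-j} \right)\left( \hat{\theta}_n - \theta_0 \right) = -\sum_{k=0}^{j-1} u_{t-k} + (0 \;\; j)\left( \hat{\theta}_n - \theta_0 \right).$$

Therefore:

$$\hat{\epsilon}_t(n)\left[ \hat{\epsilon}_{t-j}(n) - \hat{\epsilon}_t(n) \right] = \left[ \epsilon_t - Z_t\left( \hat{\theta}_n - \theta_0 \right) \right]\left[ -\sum_{k=0}^{j-1} u_{t-k} + (0 \;\; j)\left( \hat{\theta}_n - \theta_0 \right) \right]$$

$$= -\sum_{k=0}^{j-1} u_{t-k}\epsilon_t + \left( \hat{\theta}_n - \theta_0 \right)' \begin{pmatrix} 0 \\ j \end{pmatrix} \epsilon_t + \left( \hat{\theta}_n - \theta_0 \right)' \sum_{k=0}^{j-1} Z_t' u_{t-k} - Z_t\left( \hat{\theta}_n - \theta_0 \right)\left( \hat{\theta}_n - \theta_0 \right)' \begin{pmatrix} 0 \\ j \end{pmatrix},$$

which implies:

$$\sum_t \hat{\epsilon}_t(n)\left[ \hat{\epsilon}_{t-j}(n) - \hat{\epsilon}_t(n) \right] = -\sum_{k=0}^{j-1} \sum_t u_{t-k}\epsilon_t + \left( \hat{\theta}_n - \theta_0 \right)' \begin{pmatrix} 0 \\ j \end{pmatrix} \sum_t \epsilon_t + \sum_{k=0}^{j-1} \left( \hat{\theta}_n - \theta_0 \right)' \sum_t Z_t' u_{t-k} - \left( \hat{\theta}_n - \theta_0 \right)' \begin{pmatrix} 0 \\ j \end{pmatrix} \left( \hat{\theta}_n - \theta_0 \right)' \sum_t Z_t'.$$

Now apply Lemma 4.2 and Lemma A repeatedly:

$$\sum_t u_{t-k}\epsilon_t = \sum_t u_{t-k}\left( \epsilon_{t-k-1} + \sum_{l=0}^{k} u_{t-l} \right) = \sum_t u_{t-k}\epsilon_{t-k-1} + \sum_t \sum_{l=0}^{k} u_{t-k}u_{t-l} = O_p(n),$$

$$\left( \hat{\theta}_n - \theta_0 \right)' \begin{pmatrix} 0 \\ j \end{pmatrix} \sum_t \epsilon_t = O_p(n^{-1/2}) \cdot O_p(n^{3/2}) = O_p(n),$$

$$\left( \hat{\theta}_n - \theta_0 \right)' \sum_t Z_t' u_{t-k} = O_p(n^{1/2}) \cdot O_p(n^{1/2}) + O_p(n^{-1/2}) \cdot O_p(n^{3/2}) = O_p(n),$$

and finally,

$$\left( \hat{\theta}_n - \theta_0 \right)' \begin{pmatrix} 0 \\ j \end{pmatrix} \left( \hat{\theta}_n - \theta_0 \right)' \sum_t Z_t' = O_p(n^{-1/2}) \cdot \left[ O_p(n^{1/2})\,O(n) + O_p(n^{-1/2})\,O(n^2) \right] = O_p(n).$$

Thus the numerator

$$\sum_t \hat{\epsilon}_t(n)\left( \hat{\epsilon}_{t-j}(n) - \hat{\epsilon}_t(n) \right) = O_p(n).$$

To complete the proof, we now establish the asymptotic probability order of the denominator. By
Lemma 4.2 and Lemma A, $n^{-2} \sum_t \hat{\epsilon}_t(n)^2$ converges weakly to some functional of $W$. In particular, the
limiting random variable takes on the value 0 with probability 0. Therefore $\left( n^{-2} \sum_t \hat{\epsilon}_t(n)^2 \right)^{-1}$ is $O_p(1)$.
Consequently, as $n \to \infty$:

$$\frac{\sum_{t=j+1}^{n} \hat{\epsilon}_t(n)\hat{\epsilon}_{t-j}(n)}{\sum_{t=1}^{n} \hat{\epsilon}_t(n)^2} - 1 \to 0 \quad \text{in probability.}$$

This establishes the Theorem. Q.E.D.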

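As remarked in the text, the divergent first entry of $\hat{\theta}_n - \theta_0$ in part (b.) is multiplied by zero at
crucial points in evaluating $\left( \hat{\theta}_n - \theta_0 \right)' \begin{pmatrix} 0 \\ j \end{pmatrix} \sum_t \epsilon_t$ and $\left( \hat{\theta}_n - \theta_0 \right)' \begin{pmatrix} 0 \\ j \end{pmatrix} \cdot \left( \hat{\theta}_n - \theta_0 \right)' \sum_t Z_t'$. Otherwise, the
numerator would be $O_p(n^2)$, and given the asymptotic probability order of the denominator, the estimator
would fail to converge in probability.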

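Proof of Theorem 4.4 (Finite Sample Bias): In either case, the cross product is

$$\hat{\epsilon}_t(n)\hat{\epsilon}_{t-j}(n) = \epsilon_t\epsilon_{t-j} + Z_t\left( \hat{\theta}_n - \theta_0 \right)\left( \hat{\theta}_n - \theta_0 \right)'Z_{t-j}' - Z_t\left( \hat{\theta}_n - \theta_0 \right)\epsilon_{t-j} - Z_{t-j}\left( \hat{\theta}_n - \theta_0 \right)\epsilon_t.$$

Thus in either case the exact bias in the covariogram estimator is

$$B(j,n) = (n-j)^{-1} \sum_{t=j+1}^{n} Z_t\, E\left[ \left( \hat{\theta}_n - \theta_0 \right)\left( \hat{\theta}_n - \theta_0 \right)' \right] Z_{t-j}' - (n-j)^{-1} \sum_{t=j+1}^{n} \left( Z_t\, E\left[ \left( \hat{\theta}_n - \theta_0 \right)\epsilon_{t-j} \right] + Z_{t-j}\, E\left[ \left( \hat{\theta}_n - \theta_0 \right)\epsilon_t \right] \right).$$

Consider case (a.). By iterating, $\epsilon_{\lambda t} = \lambda^t\epsilon_0 + \sum_{k=0}^{t-1} \lambda^k u_{t-k}$, which implies for all $s, t$:

$$E\epsilon_{\lambda s}\epsilon_{\lambda t} = E\left[ \left( \lambda^s\epsilon_0 + \sum_{k=0}^{s-1} \lambda^k u_{s-k} \right)\left( \lambda^t\epsilon_0 + \sum_{l=0}^{t-1} \lambda^l u_{t-l} \right) \right] = \lambda^{s+t} E\left( \epsilon_0^2 \right) + \sum_{k=0}^{s-1} \sum_{l=0}^{t-1} \lambda^{k+l} E\left( u_{s-k}u_{t-l} \right).$$

Define the function:

$$f(s,t,\lambda) \stackrel{\text{def}}{=} E\epsilon_{\lambda s}\epsilon_{\lambda t}.$$

Next consider case (b.). By iterating, $\epsilon_{1t} = \epsilon_0 + \sum_{k=0}^{t-1} u_{t-k}$, so that for all $s, t$:

$$E\epsilon_{1s}\epsilon_{1t} = E\left( \epsilon_0^2 \right) + \sum_{k=0}^{s-1} \sum_{l=0}^{t-1} E\left( u_{s-k}u_{t-l} \right).$$

Define the function:

$$g(s,t) \stackrel{\text{def}}{=} E\epsilon_{1s}\epsilon_{1t}.$$

Then in case (a.):

$$E\left[ \left( \hat{\theta}_n - \theta_0 \right)\epsilon_{\lambda t} \right] = \left( \sum_{s=1}^{n} Z_s'Z_s \right)^{-1} \left( \sum_{s=1}^{n} Z_s'\, E\epsilon_{\lambda s}\epsilon_{\lambda t} \right) = \left( \sum_{s=1}^{n} Z_s'Z_s \right)^{-1} \left( \sum_{s=1}^{n} Z_s' f(s,t,\lambda) \right)$$

and

$$E\left[ \left( \hat{\theta}_n - \theta_0 \right)\left( \hat{\theta}_n - \theta_0 \right)' \right] = \left( \sum_{s=1}^{n} Z_s'Z_s \right)^{-1} \left( \sum_{s=1}^{n} \sum_{r=1}^{n} Z_s'Z_r f(s,r,\lambda) \right) \left( \sum_{s=1}^{n} Z_s'Z_s \right)^{-1}.$$

Similarly, in case (b.):

$$E\left[ \left( \hat{\theta}_n - \theta_0 \right)\epsilon_{1t} \right] = \left( \sum_{s=1}^{n} Z_s'Z_s \right)^{-1} \left( \sum_{s=1}^{n} Z_s' g(s,t) \right)$$

and

$$E\left[ \left( \hat{\theta}_n - \theta_0 \right)\left( \hat{\theta}_n - \theta_0 \right)' \right] = \left( \sum_{s=1}^{n} Z_s'Z_s \right)^{-1} \left( \sum_{s=1}^{n} \sum_{r=1}^{n} Z_s'Z_r g(s,r) \right) \left( \sum_{s=1}^{n} Z_s'Z_s \right)^{-1}.$$

Therefore:

$$B_\lambda(j,n) = (n-j)^{-1} \sum_{t=j+1}^{n} \Bigg\{ Z_t \left( \sum_{s=1}^{n} Z_s'Z_s \right)^{-1} \left( \sum_{s=1}^{n} \sum_{r=1}^{n} Z_s'Z_r f(s,r,\lambda) \right) \left( \sum_{s=1}^{n} Z_s'Z_s \right)^{-1} Z_{t-j}'$$

$$- Z_t \left( \sum_{s=1}^{n} Z_s'Z_s \right)^{-1} \left( \sum_{s=1}^{n} Z_s' f(s,t-j,\lambda) \right) - Z_{t-j} \left( \sum_{s=1}^{n} Z_s'Z_s \right)^{-1} \left( \sum_{s=1}^{n} Z_s' f(s,t,\lambda) \right) \Bigg\}.$$

The expression for $B_1$ is identical with $g$ in place of $f$. The difference in bias is then:

$$B_\lambda(j,n) - B_1(j,n) = (n-j)^{-1} \sum_{t=j+1}^{n} \Bigg\{ Z_t \left( \sum_{s=1}^{n} Z_s'Z_s \right)^{-1} \left( \sum_{s=1}^{n} \sum_{r=1}^{n} Z_s'Z_r \left( f(s,r,\lambda) - g(s,r) \right) \right) \left( \sum_{s=1}^{n} Z_s'Z_s \right)^{-1} Z_{t-j}'$$

$$- Z_t \left( \sum_{s=1}^{n} Z_s'Z_s \right)^{-1} \left( \sum_{s=1}^{n} Z_s' \left( f(s,t-j,\lambda) - g(s,t-j) \right) \right) - Z_{t-j} \left( \sum_{s=1}^{n} Z_s'Z_s \right)^{-1} \left( \sum_{s=1}^{n} Z_s' \left( f(s,t,\lambda) - g(s,t) \right) \right) \Bigg\}.$$

For each fixed $s, t$, as $\lambda \to 1$, we have $f(s,t,\lambda) - g(s,t) \to 0$. Further, $B_\lambda(j,n) - B_1(j,n)$ is evidently
continuous in $f(s,t,\lambda) - g(s,t)$. Therefore, we see immediately that as $\lambda \to 1$, $B_\lambda(j,n) - B_1(j,n) \to 0$. This
establishes the Theorem. Q.E.D.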